Doctors often make diagnoses based on a patient's image scans, such as magnetic resonance imaging (MRI), together with the patient's electronic health record (EHR), such as age, gender, and blood pressure. Although a large number of automatic methods have been proposed for image or text analysis in computer vision and natural language processing, much less research has been devoted to fusing medical images and EHR data for medical problems. Among existing early- and intermediate-fusion methods, concatenating the features of the two modalities remains the mainstream. To better leverage image and EHR data, we propose a multimodal attention module that uses the EHR data to help select important regions during the image feature extraction of a conventional CNN. Furthermore, we propose incorporating a multi-head mechanism into the gated multimodal unit (GMU) so that it can fuse image and EHR features in parallel across different subspaces. With the help of the two modules, an existing CNN architecture can be enhanced using both modalities. Experiments on predicting the Glasgow Outcome Scale (GOS) of intracerebral hemorrhage patients and classifying Alzheimer's disease show that the proposed method can automatically focus on task-related regions and achieve better results by better leveraging the image and EHR features.
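The EHR-guided attention idea can be sketched in a few lines. Below is a minimal NumPy illustration in which the EHR vector is projected into the image-feature channel space and used to softmax-weight spatial locations; the function name, the single projection matrix `w_proj`, and all shapes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def ehr_guided_attention(feature_map, ehr_vec, w_proj):
    """Re-weight CNN spatial features with an EHR-derived attention map.

    feature_map: (C, H, W) image features from a CNN stage.
    ehr_vec:     (D,) patient record features (age, gender, blood pressure, ...).
    w_proj:      (D, C) learned projection from EHR space to channel space.
    """
    query = ehr_vec @ w_proj                      # (C,) EHR query in channel space
    C, H, W = feature_map.shape
    flat = feature_map.reshape(C, H * W)          # (C, HW)
    scores = query @ flat                         # (HW,) score per spatial location
    scores = scores - scores.max()                # stabilize the softmax
    attn = np.exp(scores) / np.exp(scores).sum()  # (HW,) attention over locations
    weighted = flat * attn                        # broadcast weights over channels
    return weighted.reshape(C, H, W), attn.reshape(H, W)
```

The attention map sums to one over spatial positions, so the module only redistributes emphasis; it cannot amplify the total feature mass.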
Translated by Google Translate
Many real-world problems are computationally costly, and their objective functions evolve over time. Data-driven, a.k.a. surrogate-assisted, evolutionary optimization has been recognized as an effective approach for tackling expensive black-box optimization problems in static environments, whereas it has rarely been studied under dynamic environments. This paper proposes a simple but effective transfer learning framework to empower data-driven evolutionary optimization to solve dynamic optimization problems. Specifically, it applies a hierarchical multi-output Gaussian process to capture the correlation between data collected from different time steps with a linearly increasing number of hyperparameters. Furthermore, an adaptive source-task selection mechanism along with a bespoke warm-starting initialization are proposed to better leverage the knowledge extracted from previous optimization exercises. By doing so, data-driven evolutionary optimization can jump-start the optimization in the new environment under a strictly limited computational budget. Experiments on synthetic benchmark test problems and a real-world case study demonstrate the effectiveness of the proposed algorithm against nine state-of-the-art peer algorithms.
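The warm-starting idea, seeding the search in a new environment with knowledge carried over from the previous one, can be illustrated with a minimal sketch: the new population is filled with the best solutions from the last time step plus uniform random samples for diversity. The elite fraction and the interface below are assumptions for illustration, not the paper's mechanism.

```python
import numpy as np

def warm_start_population(prev_solutions, prev_fitness, pop_size, bounds,
                          elite_frac=0.5, seed=0):
    """Seed a new population with elites from the previous environment.

    prev_solutions: (n, dim) solutions evaluated before the change.
    prev_fitness:   (n,) objective values (minimization assumed).
    bounds:         (lo, hi) box bounds of the search space.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = prev_solutions.shape[1]
    n_elite = min(int(pop_size * elite_frac), len(prev_solutions))
    order = np.argsort(prev_fitness)            # ascending: best solutions first
    elites = prev_solutions[order[:n_elite]]    # carry over the elites
    # Fill the rest with random samples to preserve diversity after the change.
    randoms = rng.uniform(lo, hi, size=(pop_size - n_elite, dim))
    return np.vstack([elites, randoms])
```

In a dynamic setting the elite fraction trades off exploitation of the old optimum against exploration of the shifted landscape.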
Monocular depth estimation and defocus estimation are two fundamental tasks in computer vision. Most existing methods treat depth estimation and defocus estimation as two separate tasks, ignoring the strong connection between them. In this work, we propose a multi-task learning network consisting of an encoder with two decoders to estimate the depth map and defocus map from a single focal image. Through the multi-task network, depth estimation promotes defocus estimation, yielding better results in weakly textured regions, while defocus estimation promotes depth estimation through the strong physical connection between the two maps. We build a dataset (named the All-3D dataset), which is the first all-real-image dataset consisting of 100K sets of all-in-focus images, focal images with focus depths, depth maps, and defocus maps. It enables the network to learn both features and the solid physical connection between depth and real defocus images. Experiments show that the network learns more solid features from real focal images than from synthetic ones. Benefiting from this multi-task structure in which the different tasks promote each other, our depth and defocus estimation significantly outperforms other state-of-the-art algorithms. The code and dataset will be publicly available at https://github.com/cubhe/mddnet.
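One common way to express the "strong physical connection" between depth and defocus is the thin-lens circle-of-confusion relation, which maps scene depth to blur diameter given the focus distance, focal length, and aperture. The sketch below uses this standard relation as a hedged illustration; the paper may use a different parameterization.

```python
import numpy as np

def circle_of_confusion(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter as a function of scene depth.

    depth:      scene depth(s), same units as focus_dist and focal_len.
    focus_dist: distance at which the lens is focused.
    focal_len:  lens focal length (must be < focus_dist).
    aperture:   aperture (entrance pupil) diameter.
    """
    # c = A * |d - d_f| / d * f / (d_f - f): zero at the focal plane,
    # growing as the scene point moves away from it.
    return (aperture * np.abs(depth - focus_dist) / depth
            * focal_len / (focus_dist - focal_len))
```

Because the relation is deterministic given the camera parameters, a network that predicts one map can supervise, or be supervised by, the other.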
The lack of labeled training data is a bottleneck for machine learning in many applications. To address this bottleneck, a promising direction is data programming, which aggregates different sources of weak supervision signals to generate labeled data easily. Data programming encodes each weak supervision source with a labeling function (LF), a user-provided program that predicts noisy labels. The quality of the generated labels depends on the label aggregation model, which aggregates the noisy labels from all the LFs to infer the ground-truth labels. Existing label aggregation methods typically rely on various assumptions and are not robust across datasets, as we show empirically. We provide, for the first time, an analytical label aggregation method that makes minimal assumptions and is optimal in minimizing a certain form of average prediction error. Since the complexity of the analytical form is exponential, we train a model that learns to be the analytical method. Once trained, the model can be applied to any unseen dataset and infers the ground-truth labels of each dataset in a single forward pass in linear time. We show that the model can be trained using synthetically generated data, and we design an efficient architecture for it. On 14 real-world datasets, our model significantly outperforms the existing methods in both accuracy (by 3.5 points on average) and efficiency (by a speedup of six times on average).
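As a point of reference for what a label aggregation model does, here is the simplest possible baseline: unweighted majority voting over labeling-function outputs with an abstain symbol. The paper's analytical/learned aggregator is far more sophisticated; the interface below is an assumption for illustration only.

```python
import numpy as np

def aggregate_labels(lf_votes, n_classes, abstain=-1):
    """Majority-vote aggregation of noisy labeling-function (LF) outputs.

    lf_votes: (n_examples, n_lfs) integer votes; `abstain` marks no vote.
    Returns one inferred label per example.
    """
    labels = np.empty(lf_votes.shape[0], dtype=int)
    for i, row in enumerate(lf_votes):
        votes = row[row != abstain]          # drop abstentions
        if votes.size == 0:
            labels[i] = 0                    # arbitrary fallback: all LFs abstained
        else:
            counts = np.bincount(votes, minlength=n_classes)
            labels[i] = counts.argmax()      # most frequent vote wins
    return labels
```

Majority voting implicitly assumes all LFs are equally accurate and independent, exactly the kind of assumption the abstract argues is not robust across datasets.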
The point spread function (PSF) plays a crucial role in many computational imaging applications, such as shape from focus/defocus, depth estimation, and fluorescence microscopy. However, the mathematical model of the defocus process remains unclear. In this work, we develop an alternative method to estimate an accurate mathematical model of the point spread function to describe the defocus process. We first derive a mathematical algorithm for the PSF, which is used to generate simulated focal images at different focus depths. We then compute a similarity loss function between the simulated focal images and the real focal images, for which we design a novel and efficient metric based on defocus histograms to evaluate the difference between focal images. After solving for the minimum of the loss function, we obtain the optimal parameters of the PSF. We also build a hardware system, consisting of a focusing system and a structured-light system, to acquire the all-in-focus image, focal images with their corresponding focus depths, and the depth map of the same view. These three types of images, used as a dataset, serve to obtain the accurate PSF. Our experiments on standard planes and real objects show that the proposed algorithm can accurately describe the defocus process. The accuracy of our algorithm is further demonstrated by comparing the difference between real focal images and the focal images generated by our algorithm against those generated by other methods. The results show that the loss of our algorithm is, on average, 40% lower than that of the other algorithms.
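The fitting loop described above, simulating focal images with a parameterized PSF and comparing them to real focal images with a histogram-based metric, can be sketched with an isotropic Gaussian as a stand-in PSF and a simple histogram L1 distance. Both choices are illustrative assumptions, not the paper's exact PSF model or defocus-histogram metric.

```python
import numpy as np

def gaussian_psf(sigma, radius=4):
    """Isotropic Gaussian kernel as a stand-in PSF model (normalized to sum 1)."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def defocus(image, sigma):
    """Simulate a defocused (focal) image by convolving with the PSF."""
    k = gaussian_psf(sigma)
    r = k.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    out = np.zeros_like(image, dtype=float)
    H, W = image.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * k).sum()
    return out

def histogram_loss(img_a, img_b, bins=32):
    """L1 distance between intensity histograms of two focal images."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 1), density=True)
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 1), density=True)
    return np.abs(ha - hb).mean()
```

Minimizing `histogram_loss(defocus(all_in_focus, sigma), real_focal)` over `sigma` then recovers the PSF parameter; the paper solves the analogous problem with its own PSF model and metric.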
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as input and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature level and instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarking results on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be available.
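The first of the two insights above, mask-based dynamic class centers used to re-weight query features, can be sketched as mask-weighted pooling followed by cosine-similarity re-weighting. The shapes and interfaces below are assumptions for illustration, not the RefT implementation.

```python
import numpy as np

def masked_class_center(support_feats, support_mask):
    """Pool support features under the object mask into one dynamic class center.

    support_feats: (H, W, C) feature map of a support image.
    support_mask:  (H, W) binary object mask.
    """
    w = support_mask / (support_mask.sum() + 1e-8)      # normalized spatial weights
    return (support_feats * w[..., None]).sum(axis=(0, 1))  # (C,)

def reweight_query(query_feats, center):
    """Scale query features by their cosine similarity to the class center."""
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-8)
    c = center / (np.linalg.norm(center) + 1e-8)
    sim = q @ c                                          # (H, W) similarity map
    return query_feats * sim[..., None], sim
```

Locations resembling the support object are amplified and the rest are suppressed, which is one plausible reading of "re-weight query features" at the feature level.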
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structure, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
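For readers unfamiliar with the setup, below is a minimal sketch of the two quantities in tension here: a temperature-scaled distillation loss that pulls the student toward the teacher, and a demographic-parity gap the student should keep small. RELIANT's actual debiasing objective is not specified in the abstract, so this only illustrates the ingredients.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) with temperature scaling, averaged over nodes."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return (T**2) * kl.mean()   # T^2 keeps gradient scale comparable across T

def parity_gap(preds, groups):
    """Demographic-parity gap: |P(y_hat=1 | g=0) - P(y_hat=1 | g=1)|."""
    return abs(preds[groups == 0].mean() - preds[groups == 1].mean())
```

If the teacher is biased, minimizing the distillation loss alone drives the parity gap of the student toward the teacher's; a fairness-aware framework must trade the two off.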
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models, while trading off model accuracy and efficiency well.
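A toy rendering of the iRMB idea, an inverted residual (expand, mix, project, skip) whose middle step is a Transformer-style token mixer for long-distance interactions, might look as follows. The shapes and the single-head attention are illustrative assumptions, not EMO's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def irmb_like(x, w_expand, w_project):
    """Inverted-residual block with an attention token mixer in the middle.

    x:         (n_tokens, d) token features.
    w_expand:  (d, d_h) pointwise expansion, d_h > d (the inverted bottleneck).
    w_project: (d_h, d) pointwise projection back to d.
    """
    h = np.maximum(x @ w_expand, 0.0)               # expand + ReLU (CNN-like, local)
    attn = softmax(h @ h.T / np.sqrt(h.shape[1]))   # token mixing (long-distance)
    h = attn @ h
    return x + h @ w_project                        # project back + residual skip
```

The expansion and projection mirror MobileNetv2's inverted residual, while the attention step stands in for the Transformer-like dynamic modeling that the Meta-Mobile Block abstraction allows in the middle.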